home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
BCI NET
/
BCI NET Dec 94.iso
/
archives
/
telecomm
/
bbs
/
d342.lha
/
HACK3.man
< prev
next >
Wrap
Text File
|
1992-04-14
|
22KB
|
448 lines
/************************************************************************/
/* hack.doc */
/************************************************************************/
/************************************************************************/
/* History */
/* */
/* 85Apr27 HAW Partially updated for Citadel-86. */
/* 82Dec07 CrT Completed. */
/* 82Nov23 CrT Created. */
/************************************************************************/
/************************************************************************/
/* Audience */
/* */
/* Folks trying to read or modify the Citadel code. */
/************************************************************************/
/************************************************************************/
/* Purpose */
/* */
/* Explain the basic data structures and algorithms. */
/************************************************************************/
/************************************************************************/
/* Overview */
/************************************************************************/
The fundamental structure of the system is very simple. CtdlMsg.sys
is a circular file. New messages are simply written around the buffer
in an endless circle, overwriting old messages in their way. There is
no other way of deleting messages, and text is never shuffled on disk.
Messages are numbered consecutively and start with an FF (hex)
byte. Except for this FF start-of-message byte, all bytes in the message
file have the high bit set to 0. "This means that in principle it is
trivial to scan through the message file and locate message N if it
exists, or return error. (Complexities, as usual, crop up when we
try for efficiency...)
Each room is basically just a list of message numbers. Each time
we enter a new message in a room, we slide all the old message-numbers
down a slot, and probably the oldest one falls off the bottom. Reading
a room is just a matter looking up the messages one by one and printing
them out. If the message has been overwritten already, we just ignore it.
Implementing the "new message" function is also trivial in principle:
we just keep track, for each caller in the userlog, of the highest-numbered
message which existed on the >last< call. (Remember, message numbers are
simply assigned sequentially each time a message is created. This
sequence is global to the entire system, not local within a room.) If
we ignore all message-numbers in the room less than this, only new messages
will be printed. Voila! Code up the system described thus far, and
you'll have a good approximation to Version 1. Better stop and reread
everything to here, so you can pick out the fundamental mechanisms among
all of Version 2's bells and whistles.
/************************************************************************/
/* message format on disk (ctdlMsg.sys) */
/************************************************************************/
Message format has changed relative to V1, in the direction of using
more disk space and providing greater flexibility.
A message now consists of a sequence of character strings. Each string
begins with a type byte indicating the meaning of the string and is
ended with a null. All strings are printable ASCII: in particular,
all numbers are in ASCII rather than binary. This is for simplicity,
both in implementing the system and in implementing other code to
work with the system. For instance, a database driven off Citadel archives
can do wildcard matching without worrying about unpacking binary data such
as dates first. To provide later downward compatability,
all software should be written to IGNORE fields not currently defined.
/************************************************************************/
/* The type bytes currently defined are: */
/************************************************************************/
BYTE Mnemonic Comments
0xFF Start-of-message indicator. Followed by local
message ID# as ASCII string, null-terminated as
always. This byte is the >only< byte which has
the high bit set in a Citadel message.buf file.
This field must be present in every message.
A Author Name of originator of message.
D Date Date message was created.
C Time Time message was created.
M Message Text of message. Is last field in a message, by
definition. Following data will be ignored.
This field must be present in every message.
N Name Human name for node originated on. Used on
title line of foreign messages. Ex:
ODD-DATA
will produce a title message something like
82Nov23 from Cynbe ru Taren @ODD-DATA
O Origin ID of node message originated on: Country code plus
phone number of system. (Not stored for locally
originated messages.) Ex:
US 206 633 3282
R Room Room of origin. Topic.
S Source ID# Message ID # on system message was created on.
Two 16-bit integers (high and low halves of
full 32-bit ID#) separated by a blank. Ex:
0 13654
T To Addressee. Used only for private messages in Mail>,
in version 2.00 .
EXAMPLE
Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte. Then a message
which prints as
LOGLAN> read new
82Nov04 From James Cooke Brown
Loi uiue i Ti logla
LOGLAN>
might be stored as
<FF>0 3583<0>D82Nov04<0>AJames Cooke Brown<0>RLOGLAN<0>MLoi uiue i Ti logla<0>
|--Local ID--|---Date---|-----Author---------|--Room---|-------Message--------|
The date, room and author fields could be in any order. Not all fields
are printed by default. The local ID# and Room field are suppressed here.
An isolated system will not normally have use for fields beyond those
used in this example.
Lines are marked with C NewLine (ASCII LF) characters, within the message
text proper.
/************************************************************************/
/* Networking */
/************************************************************************/
Citadel nodes network by sharing one or more rooms. Any Citadel node
can choose to keep an image of any public room on any other Citadel node
(subject to willingness to foot the phone bills, of course!). The
procedure in essence simply involves calling the imaged node up periodically
and copying any new messages in the imaged room into the local image.
There is no necessary reciprocity or pre-arrangement, although convenience
and politeness respectively suggest both. The node which gets the
information foots the phone bill for the transaction. This promotes
simple and harmonious relations between the nodes.
Complexities arise primarily from the possibility of densely connected
networks: one does not wish to accumulate multiple copies of a given
message, which can easily happen. Nor does one want to see old messages
percolating indefinitely through the system.
This problem is handled by a simple brute-force mechanism: each node
keeps a list of all messages it has seen recently, recording origin
system, creation date, and original ID#. When downloading, messages
which have already been seen, or which are too old to be remembered,
are skipped. Messages can percolate outward through a large network
with no global routing or control, but do not reproduce wildly or
cycle indefinitely.
The above discussion should make the function of the new
fields reasonably clear:
o Every message needs a local ID#, so the system can determine if
a given caller has seen it before.
o Travelling messages need to carry system of origin, date of
origin, and original ID# with them, to keep reproduction and
cycling under control.
(Uncoincidentally) the format used to transmit messages for networking
purposes is precisely that used on disk, except that lines are marked
with ASCII CR characters in stead of ASCII LF characters. The current
distribution includes CTDLNET, which is basically a database integrator;
please see CTDLNET.DOC on its operation and functionality (if any).
/************************************************************************/
/* portability problems */
/************************************************************************/
Check all I/O, modem, console, and file stuff, and especially the
dPrintf and mPrintf functions.
/************************************************************************/
/* "Room" records (ctdlRoom.sys) */
/************************************************************************/
The rooms are basically indices into ctdlMsg.sys, the message file.
As noted in the overview, each is essentially an array of pointers into
the message file. The pointers consist of a 16-bit message ID number
(we will wrap around at 64K for these purposes) together with a 16-bit
psuedo-sector offset within ctdlMsg.sys telling us where the message begins.
Since messages are number sequentially and written circularly, the
set of messages existing in ctdlMsg.sys will always form a continuous
sequence at any given time. Thus, by remembering the ID numbers of the
oldest and newest messages in the message file, we can check to see
if a message exists >before< going to disk, saving ourselves (and the
disk drive) the pain of futile seeks in search of the nonexistent.
This information is recorded in oldest and newest, 32 bit numbers.
You'll be seeing more of these...
The newest is simply incremented each time we enter a
new message in the message files. Oldest is incremented
each time we overwrite an FF (start-of-message) byte in the course
of writing a new message into the files. This corresponds to dead
reckoning -- current code never checks to see that the message number
of the message we are overwriting is what we think it is. In a garbaged
file with extra FF bytes around, this could cause oldest to
count too rapidly, eventually perhaps overtaking newest,
at which time the system will look completely empty. If you suspect
something like this is going on, just use configur.exe to rebuild
accurate numbers.
That should be enough background to tackle a full-scale room. From
ctdl.h :
struct {
unsigned char rbgen; /* generation number of room */
struct rflags rbflags; /* same bits as flags above */
char rbname[NAMESIZE]; /* name of this room */
char rbdisk; /* disk this room's files are in */
char rbdirname[9]; /* user directory for room's files */
struct {
ulong rbmsgNo; /* every message gets unique# */
ulong rbmsgLoc; /* sector message starts in */
} msg[MSGSPERRM];
} roomBuf;
[Note that all components start with "rb" for roomBuf, to make sure we
don't accidentally use an offset in the wrong structure. Be very careful
also to get a meaningful sequence of components --
C86 provides no checking on this sort of stuff either.]
Rbgen handles the problem of rooms which have died and been reborn
under another name. This will be clearer when we get to the log file.
For now, just note that each room has a generation number which is
bumped by one each time it is recycled.
Rbflags is just a bag of bits recording the status of the room. The
defined bits are:
Bit 0: INUSE. 1 if the room is valid, 0 if it is free for re-assignment.
Bit 1: PUBLIC. 1 if the room is visible by default, else 0.
Bit 2: MSDOSDIR. 1 if the room is a window onto some disk/userspace, else 0.
Bit 3: PERMROOM. 1 if the room should not be recycled even if empty, else 0.
(Lobby>, Mail> and all CPMDIRs are automatically permanent.)
Rbname is just an ASCII string (null-terminated, like all strings)
giving the name of the room.
Rbdisk and rbdirname are meaningful only in MSDOSDIR rooms, in which case
they give the disk (0 == A:, 1==B: ... 15==P:) and directory name (sysop
selected) to window.
Finally, msg is the array of pointers into the message file. RbmsgNo
is the ID number of the message, and RbmsgLoc is the sector it begins
in. (For NIL, we stick the value -1 in RbmsgLoc.)
RoomTab is just a little index into ctdlRoom.sys, to keep us from
constantly pounding around on the disk looking for things. It basically
records the name and status of each room. It is 100% redundant with
the file itself... as all our indices are. (As all indices should be!)
Note that RoomTab is a significant consumer of RAM all by itself. It
is RAM well spent, but if you have to shave Citadel a few K to make
it fit your system, cutting the number of rooms a bit is one try.
The only field new to us in roomTab is rtlastMessage, recording the
most recent message in the room. When we are searching for rooms with
messages a given caller hasn't seen, we can check this number in RAM
and avoid unprofitable disk accesses.
/************************************************************************/
/* log records (ctdlLog.sys) */
/************************************************************************/
This is the fun one. Get some fresh air and plug in your thinking cap
first. (Time, space and complexity are the eternal software rivals.
We've got 256 log entries x about 500 messages spread over up to 128
rooms to worry about, and with floppies disk access time is important...
so perforce, we opt for lots of complexity to keep time and space in bounds.)
To understand what is happening in the log code takes a little persistence.
You also have to disentangle the different activities going on and
tackle them one by one.
o We want to remember some random things such as terminal screen
size, and automatically set them up for each caller at login.
o We want to be able to locate all new messages, and only new
messages, efficiently. Messages should stay new even if it
takes a caller a couple of calls to get around to them.
o We want to remember which private rooms a given caller knows
about, and treat them as normal rooms. This means mostly
automatically seeking out those with new messages. (Obviously,
we >don't< want to do this for unknown private rooms!) This
has to be secure against the periodic recycling of rooms
between calls.
o We want to support private mail to a caller.
o We want to provide some protection of this information (via
passwords at login) and some assurance that messages are from
who they purport to be from (within the system -- one shouldn't
be able to forge messages from established users).
Lifting another page from ctdl.h gives us:
struct logBuffer {
unsigned char lbnulls; /* # nulls to print after newline */
struct lflags lbflags; /* UCMASK, LFMASK, EXPERT */
unsigned char lbwidth; /* terminal width */
char lbname[NAMESIZE]; /* caller's name */
char lbpw[NAMESIZE]; /* caller's password */
int lbgen[MAXROOMS]; /* 5 bits gen, 3 bits lastVisit */
ulong lbvisit[MAXVISIT]; /* newestLo on last few calls */
ulong lbslot[MAILSLOTS]; /* for private mail */
ulong lbId[MAILSLOTS]; /* for private mail */
}
Looks simple enough, doesn't it? One topic at a time:
RANDOM CONFIGURATION PARAMETERS
These are in the first three fields in the record. Lbnulls gives the
number of nulls to print after a newline, for folks who like
simultaneous hardcopy. Or any remaining ASR33 aficionados calling up...
Lbwidth is the caller's screen width. We format all messages to this
width, as best we can. Lbflags is another bit-bag, recording
uppercase-only folks, people who need a linefeed after a carraige-return,
people who want to suppress the little automatic hints all through
the system, and people who like to see the time a message was created.
FINDING NEW MESSAGES
This is the most important. Thus, it winds up being the most
elaborate. Conceptually, what we would like to do is mark each
message with a bit after our caller has read it, so we can avoid
printing it out again next call. Unfortunately, with 256 log
entries this would require adding two sectors to each message... and
we'd wind up reading off disk lots of messages which would never
get printed. So we resort to an arcane mixture of approximation
and low animal cunning.
The approximation comes in doing things at the granularity of
rooms rather than messages. Messages in a given room are "new"
until we visit it, and "old" after we leave the room... whether
we read any of them or not. This can actually be defended: anyone
who passes through a room without reading the contents probably just
isn't interested in the topic, and would just as soon not be dragged
back every visit and forced to read them. Given that messages are
numbered sequentially, we can simply record the most recent message ID#
of each room as of the last time we visited it. With 128 rooms, this
would give us (for each user) an array of 128 integers, or 256 bytes.
This is still too much -- I'd like the complete log record for a user
to be 256 bytes or less, and we have other stuff to do yet.
So, we complicate a little more. We record in lbvisit[MAXVISIT] the
most recent message ID# in the system on each of the last six calls
or so. Now, for each room, we can just indicate which call we last
visited the room on. This takes 3 bits per room, which we stash in
the low three bits of lbgen[MAXROOMS]. Now we're down to
128 rooms x 3 bits (plus a few bytes in lbvisit[], of course),
which is quite reasonable.
Putting it all together, we can now compute whether a given room
has new messages for our current caller without going to disk at all:
> We get the lbgen[] entry for the room in question
> We mask off the lower 3 bits
> We use the result as an index into lbvisit[], getting the ID number
of the most recent message in the system as of the last time we
visited the room.
> We compare this with roomTab[].rtlastMessage, which tells us
what the most recent message in the room is currently.
REMEMBERING WHICH PRIVATE ROOMS TO VISIT
This looks trivial at first glance -- just record one bit per room per
caller in the log records. The problem is that rooms get recycled
periodically, and we'd rather not run through 256 log entries each
time we do it. So we adopt a kludge which should work 99% of the time.
As previously noted, each room has a generation number, which is bumped
by one each time it is recycled. As not noted, this generation number
runs from 0 -> 31 (and then wraps around and starts over). Thus, these
numbers take 5 bits to represent. By a miraculous coincidence, we have
exactly 5 bits left in the lbgen[] entries in the log records. [Anyone
familiar with "capability" pointers will be encountering deja vu here...]
When someone visits a room, we set the generation number in lbgen[]
equal to that of the room. This flags the room as being available.
If the room gets recycled, on our next visit the two generation numbers
will no longer match, and the room will no longer be available -- just
the result we're looking for. (Naturally, if a room is marked as PUBLIC,
all this stuff is irrelevant.)
This leaves only the problem of an accidental matchup between the two
numbers giving someone access to a Forbidden Room. We can't eliminate
this danger completely, but it can be reduced to insignificance for
most purposes. (Just don't bet megabucks on the security of this system!)
Each time someone logs in, we set all "wrong" generation numbers to be
one less than the actual generation of the room. This means that the
room must be recycled thirty-one times before an accidental matchup
can be achieved. (We do this for all rooms, INUSE or dead, public
or private, since any of them may be reincarnated as a Forbidden Room.)
Thus, for someone to accidentally be lead to a Forbidden Room, they
must establish an account on the system, then not call until some room
has been recycle thirty-one to thirty-two times, which room must be
reincarnated as a Forbidden Room, which someone must now call back
(having not scrolled off the userlog in the mean time) and read new
messages. The last clause is about the only probable one in the sequence.
The danger of this is much less than the danger that someone will
simply guess the name of the room outright...
SUPPORTING PRIVATE MAIL
Can one have an elegant kludge? This must come pretty close.
Private mail is sent and recieved in the Mail> room, which otherwise
behaves pretty much as any other room. To make this work, we store
the actual message pointers in lbslot[] and lbId[] in the caller's
log record, and then copy them into the Mail> room array whenever we
enter the room. This requires a little fiddling to get things just
right. We have to update roomTab[MAILROOM].rtlastMessage at login
to reflect the presence or absence of new messages, for example. And
MakeMessage() has to be kludged to ask for the name of the recipient
of the message whenever a message is entered in Mail>. But basically
it works pretty well, keeping the code and user interface simple and
regular.
PASSWORDS AND NAME VALIDATION
LogTab[] indexes ctdlLog.sys, giving us a quick way of finding people.
It is basically a chronologically sorted hash table. We keep a two-byte
hash of the name and password of each caller in RAM. When someone tries
to log in, we just whip through the table in order looking for matches
on the password hash and loading the matching logfile entry in. Bogus
hits are eliminated by the simple expedient of refusing to acknowledge
a new user who's name or password hashes on top of an existing user.
Computer chauvinism at it's best...
This makes it difficult to forge messages from an existing user. (Fine
point: nonprinting characters are converted to printing characters, and
leading, trailing, and double blanks are deleted.)